VARCO-VISION-14B is an English-Korean Vision-Language Model (VLM) that takes image and text input and generates text output. It supports grounding, referencing, and OCR.
Image-to-Text
Transformers
Supports Multiple Languages